AITopics | Western Province

Western Province

Building low-resource African language corpora: A case study of Kidawida, Kalenjin and Dholuo

Mbogho, Audrey, Awuor, Quin, Kipkebut, Andrew, Wanzare, Lilian, Oloo, Vivian

arXiv.org Artificial Intelligence, Jan-19-2025

Natural Language Processing is a crucial frontier in artificial intelligence, with broad applications in many areas, including public health, agriculture, education, and commerce. However, due to the lack of substantial linguistic resources, many African languages remain underrepresented in this digital transformation. This paper presents a case study on the development of linguistic corpora for three under-resourced Kenyan languages, Kidaw'ida, Kalenjin, and Dholuo, with the aim of advancing natural language processing and linguistic research in African communities. Our project, which lasted one year, employed a selective crowd-sourcing methodology to collect text and speech data from native speakers of these languages. Data collection involved (1) recording conversations and translation of the resulting text into Kiswahili, thereby creating parallel corpora, and (2) reading and recording written texts to generate speech corpora. We made these resources freely accessible via open-research platforms, namely Zenodo for the parallel text corpora and Mozilla Common Voice for the speech datasets, thus facilitating ongoing contributions and access for developers to train models and develop Natural Language Processing applications. The project demonstrates how grassroots efforts in corpus building can support the inclusion of African languages in artificial intelligence innovations. In addition to filling resource gaps, these corpora are vital in promoting linguistic diversity and empowering local communities by enabling Natural Language Processing applications tailored to their needs. As African countries like Kenya increasingly embrace digital transformation, developing indigenous language resources becomes essential for inclusive growth. We encourage continued collaboration from native speakers and developers to expand and utilize these corpora.
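The abstract describes pairing transcribed conversations with their Kiswahili translations to form parallel corpora. The sketch below illustrates only that packaging step. It assumes a simple layout with one aligned pair per tab-separated row and skips segments that have not yet been translated. The `SegmentPair` schema and the column names are hypothetical, and the files published on Zenodo may use a different format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class SegmentPair:
    """One transcribed utterance and its Kiswahili translation (hypothetical schema)."""
    segment_id: str
    language: str      # source language, e.g. "Dholuo"
    source_text: str   # transcription in the source language
    swahili_text: str  # human translation into Kiswahili


def write_parallel_tsv(pairs, out) -> int:
    """Write aligned pairs as TSV rows to a text stream; return how many rows were kept."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["segment_id", "language", "source", "swahili"])
    kept = 0
    for p in pairs:
        src, tgt = p.source_text.strip(), p.swahili_text.strip()
        if not src or not tgt:
            continue  # drop segments lacking either side so every row stays aligned
        writer.writerow([p.segment_id, p.language, src, tgt])
        kept += 1
    return kept


if __name__ == "__main__":
    buf = io.StringIO()
    rows = [
        SegmentPair("dho-0001", "Dholuo", "<transcription>", "<Kiswahili translation>"),
        SegmentPair("dho-0002", "Dholuo", "<transcription>", ""),  # not yet translated
    ]
    print(write_parallel_tsv(rows, buf))  # 1
    print(buf.getvalue())
```

Keeping one pair per row makes the corpus straightforward to load into common machine translation toolkits. Dropping half-empty rows keeps the source and target sides aligned.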

large language model, machine learning, natural language

arXiv:2501.11003

Country:

Africa > South Sudan (0.14)
Africa > Uganda (0.05)
North America > United States (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.67)
Media > News (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Uchaguzi-2022: A Dataset of Citizen Reports on the 2022 Kenyan Election

Mondini, Roberto, Kotonya, Neema, Logan, Robert L. IV, Olson, Elizabeth M, Lungati, Angela Oduor, Odongo, Daniel Duke, Ombasa, Tim, Lamba, Hemank, Cahill, Aoife, Tetreault, Joel R., Jaimes, Alejandro

arXiv.org Artificial Intelligence, Dec-17-2024

Online reporting platforms have enabled citizens around the world to collectively share their opinions and report in real time on events impacting their local communities. Systematically organizing (e.g., categorizing by attributes) and geotagging large amounts of crowdsourced information is crucial to ensuring that accurate and meaningful insights can be drawn from this data and used by policy makers to bring about positive change. These tasks, however, typically require extensive manual annotation efforts. In this paper we present Uchaguzi-2022, a dataset of 14k categorized and geotagged citizen reports related to the 2022 Kenyan General Election containing mentions of election-related issues such as official misconduct, vote count irregularities, and acts of violence. We use this dataset to investigate whether language models can assist in scalably categorizing and geotagging reports, thus highlighting its potential application in the AI for Social Good space.
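The paper tests whether language models can categorize and geotag reports. A far simpler gazetteer (place-name dictionary) lookup is a common baseline for the geotagging half of such tasks. The sketch below uses a hypothetical four-entry gazetteer and returns the counties mentioned in a report. It is not the paper's method, and it ignores ambiguous names, misspellings, and multi-word place names.

```python
import re

# Hypothetical gazetteer mapping lowercase place names to counties.
# A usable one would list all 47 Kenyan counties plus towns and wards.
GAZETTEER = {
    "bomet": "Bomet",
    "kisumu": "Kisumu",
    "mombasa": "Mombasa",
    "nairobi": "Nairobi",
}


def geotag(report: str, gazetteer: dict = GAZETTEER) -> list:
    """Return counties whose names occur as whole words, in order of first mention."""
    found = []
    for token in re.findall(r"[a-z]+", report.lower()):
        county = gazetteer.get(token)
        if county is not None and county not in found:
            found.append(county)
    return found


if __name__ == "__main__":
    print(geotag("Long queues at a polling station in Bomet; similar delays in Nairobi."))
    # ['Bomet', 'Nairobi']
```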

computational linguistic, dataset, proceedings

arXiv:2412.13098

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Africa > Kenya > Bomet County > Bomet (0.05)

Genre: Research Report (0.50)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Voting & Elections (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Bayesian Counterfactual Prediction Models for HIV Care Retention with Incomplete Outcome and Covariate Information

Oganisian, Arman, Hogan, Joseph, Sang, Edwin, DeLong, Allison, Mosong, Ben, Fraser, Hamish, Mwangi, Ann

arXiv.org Artificial Intelligence, Oct-29-2024

Like many chronic diseases, human immunodeficiency virus (HIV) is managed over time at regular clinic visits. At each visit, patient features are assessed, treatments are prescribed, and a subsequent visit is scheduled. There is a need for data-driven methods for both predicting retention and recommending scheduling decisions that optimize retention. Prediction models can be useful for estimating retention rates across a range of scheduling options. However, training such models with electronic health records (EHR) involves several complexities. First, formal causal inference methods are needed to adjust for observed confounding when estimating retention rates under counterfactual scheduling decisions. Second, competing events such as death preclude retention, while censoring events render retention missing. Third, inconsistent monitoring of features such as viral load and CD4 count leads to covariate missingness. This paper presents an all-in-one approach for both predicting HIV retention and optimizing scheduling while accounting for these complexities. We formulate and identify causal retention estimands in terms of potential return-time under a hypothetical scheduling decision. Flexible Bayesian approaches are used to model the observed return-time distribution while accounting for competing and censoring events, and to form posterior point and uncertainty estimates for these estimands. We address the urgent need for data-driven decision support in HIV care by applying our method to EHR from the Academic Model Providing Access to Healthcare (AMPATH), a consortium of clinics that treat HIV in Western Kenya.
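The paper's estimator is Bayesian and also adjusts for confounding and missing covariates. To illustrate only the competing-event and censoring structure described above, the sketch below substitutes a classical nonparametric tool, the Aalen-Johansen cumulative incidence estimator. It estimates the probability that a patient has returned to clinic by time t when death competes with return and some follow-up is censored. It has no covariates, no causal adjustment, and no scheduling decision. The event coding (0, 1, 2) is an assumption made for this example.

```python
from collections import Counter


def cumulative_incidence(times, events, cause=1):
    """Aalen-Johansen estimate of P(T <= t, event == cause) with competing risks.

    events: 0 = censored, 1 = returned to clinic, 2 = died (competing event).
    Returns a list of (t, estimate) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0  # probability of no event of any kind just before the current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        counts = Counter()
        j = i
        while j < len(data) and data[j][0] == t:  # group ties at time t
            counts[data[j][1]] += 1
            j += 1
        d_all = sum(n for code, n in counts.items() if code != 0)  # events of any cause
        if d_all:
            cif += surv * counts[cause] / n_at_risk  # hazard of this cause times prior survival
            surv *= 1 - d_all / n_at_risk
            curve.append((t, cif))
        n_at_risk -= j - i  # events and censorings at t both leave the risk set
        i = j
    return curve


if __name__ == "__main__":
    # Toy follow-up times (e.g., weeks after a scheduled visit); not AMPATH data.
    print(cumulative_incidence([1, 2, 2, 3, 4], [1, 2, 0, 1, 1]))
```

Treating death as censoring would instead overstate the chance of return, which is why the abstract singles out competing events separately from censoring.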

covariate, probability, scheduling decision

arXiv:2410.22481

Country:

Africa > Kenya > Western Province (0.24)
Africa > Kenya > Trans-Nzoia County > Kitale (0.04)
Africa > South Africa (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology > HIV (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Detection of Malaria Vector Breeding Habitats using Topographic Models

Jadhav, Aishwarya

arXiv.org Artificial Intelligence, Jul-16-2024

Treatment of stagnant water bodies that act as breeding sites for malaria vectors is a fundamental step in most malaria elimination campaigns. However, identifying such water bodies over large areas is expensive, labour-intensive and time-consuming, and hence challenging in countries with limited resources. Practical models that can efficiently locate water bodies can help target those limited resources by greatly reducing the area that field workers need to scan. To this end, we propose a practical topographic model based on easily available, global, high-resolution digital elevation model (DEM) data to predict the locations of potential vector-breeding water sites. We surveyed the Obuasi region of Ghana to assess the impact of various topographic features on different types of water bodies and to uncover the features that significantly influence the formation of aquatic habitats. We further evaluate the effectiveness of multiple models. Our best model significantly outperforms earlier attempts that use topographic variables to detect small water sites, including those that also use satellite imagery data, and demonstrates robustness across different settings.
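The abstract does not list which topographic features the model uses. The sketch below shows two DEM-derived variables of the general kind such models draw on. It computes slope by central differences and flags local sinks, meaning interior cells lower than all eight neighbours, where standing water can collect. Both the grid layout (a list of rows of elevations in metres) and the choice of features are assumptions, not the paper's feature set.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope in degrees at each interior cell of a DEM grid (edges left as None)."""
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # central differences of elevation along x (columns) and y (rows)
            dzdx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dzdy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out


def local_sinks(dem):
    """Interior cells strictly lower than all 8 neighbours: candidate ponding spots."""
    rows, cols = len(dem), len(dem[0])
    sinks = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbours = [dem[r + dr][c + dc]
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            if all(dem[r][c] < z for z in neighbours):
                sinks.append((r, c))
    return sinks


if __name__ == "__main__":
    bowl = [[10, 10, 10],
            [10, 8, 10],
            [10, 10, 10]]
    print(local_sinks(bowl))  # [(1, 1)]
```

In practice these per-cell values would become input columns for a classifier or regression model, alongside whatever other features the survey found informative.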

dataset, topographic feature, water body

arXiv:2011.13714

Country:

North America > United States (0.29)
Africa > Ghana (0.25)
Africa > Kenya > Western Province (0.05)

Genre: Research Report > Experimental Study (0.48)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases > Vector-Borne Disease (0.65)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Development of Semantics-Based Distributed Middleware for Heterogeneous Data Integration and its Application for Drought

Akanbi, A

arXiv.org Artificial Intelligence, May-17-2024

Drought is a complex environmental phenomenon that affects millions of people and communities all over the globe and is too elusive to be accurately predicted. This is mostly due to the scale and variability of the web of environmental parameters that directly or indirectly cause the onset of different categories of drought. Since the dawn of man, efforts have been made to understand the natural indicators that provide signs of likely environmental events. These indicators, in the form of indigenous knowledge systems, have been used for generations. The intricate complexity of drought has, however, always been a major stumbling block for accurate drought prediction and forecasting systems. Recently, scientists in agriculture and environmental monitoring have been discussing the integration of indigenous knowledge and scientific knowledge into more accurate environmental forecasting systems, so that diverse environmental information can be combined into a reliable drought forecast. The core objective of this research is therefore the development of a semantics-based data integration middleware that integrates heterogeneous data models of local indigenous knowledge and sensor data into an accurate drought forecasting system for the study areas. The local indigenous knowledge on drought gathered from domain experts is transformed into rules, which the middleware's automated inference generation module uses, together with sensor data, to perform deductive inference and determine the onset of drought. The semantic middleware incorporates, inter alia, a distributed architecture consisting of a streaming data processing engine based on Apache Kafka for real-time stream processing, a rule-based reasoning module, and an ontology module for semantic representation of the knowledge bases.
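The abstract describes indigenous knowledge being converted into rules and combined with sensor readings through deductive inference. The minimal in-memory sketch below shows that idea with naive forward chaining. It leaves out the ontology module and the Kafka streaming layer entirely. All fact names, the soil-moisture threshold, and the rules themselves are hypothetical placeholders, not the paper's rule base.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]  # test over the current facts
    conclusion: str                    # fact set to True when the condition holds


def forward_chain(facts: dict, rules: list) -> dict:
    """Naive forward chaining: fire rules until no new fact is derived."""
    facts = dict(facts)  # do not mutate the caller's observations
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if not facts.get(rule.conclusion) and rule.condition(facts):
                facts[rule.conclusion] = True
                changed = True
    return facts


# Hypothetical rule base: fact names and the 15% threshold are placeholders,
# not indicators or thresholds taken from the paper.
RULES = [
    Rule("ik_signal", lambda f: f.get("indigenous_indicator_observed", False), "ik_dry_signal"),
    Rule("sensor_signal", lambda f: f.get("soil_moisture_pct", 100.0) < 15.0, "sensor_dry_signal"),
    Rule("onset", lambda f: f.get("ik_dry_signal", False) and f.get("sensor_dry_signal", False),
         "drought_onset"),
]

if __name__ == "__main__":
    derived = forward_chain({"indigenous_indicator_observed": True, "soil_moisture_pct": 12.0}, RULES)
    print(derived.get("drought_onset", False))  # True
```

In a streaming deployment, each new sensor message would update the facts and trigger a fresh inference pass. A production system would normally use a dedicated rule engine rather than this loop.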

accurate knowledge representation, environmental monitoring domain, lightweight ontology representation

arXiv:2405.10713

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.13)
Africa > Sub-Saharan Africa (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
Personal (0.92)

Industry:

Health & Medicine (1.00)
Government (1.00)
Food & Agriculture > Agriculture (1.00)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)

Rural Kenyans power West's AI revolution. Now they want more

Al Jazeera, Feb-3-2024, 08:30:14 GMT

Naivasha, Kenya – Caroline Njau comes from a family of farmers who tend fields of maize, wheat, and potatoes in the hilly terrain near Nyahururu, 180 kilometres (112 miles) north of the capital Nairobi. But Njau has chosen a different path in life. Seated in her living room with a cup of milk tea, she labels data for artificial intelligence (AI) companies abroad on an app. The sun rises over the unpaved streets of her neighbourhood as she flicks through images of tarmac roads, intersections and sidewalks on her smartphone, carefully drawing boxes around various objects: traffic lights, cars, pedestrians, and signposts. The designer of the app, an American subcontractor to Silicon Valley companies, pays her $3 an hour.

cheruyot, kenya, nairobi

Country:

Africa > Kenya > Nairobi City County > Nairobi (0.29)
North America > United States > California (0.26)
Africa > South Africa (0.06)

Industry:

Information Technology (0.71)
Transportation > Ground > Road (0.70)
Transportation > Infrastructure & Services (0.55)

Technology: Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.32)

Causal Machine Learning for Cost-Effective Allocation of Development Aid

Kuzmanovic, Milan, Frauen, Dennis, Hatt, Tobias, Feuerriegel, Stefan

arXiv.org Artificial Intelligence, Jan-31-2024

The Sustainable Development Goals (SDGs) of the United Nations provide a blueprint of a better future by 'leaving no one behind', and, to achieve the SDGs by 2030, poor countries require immense volumes of development aid. In this paper, we develop a causal machine learning framework for predicting heterogeneous treatment effects of aid disbursements to inform effective aid allocation. Specifically, our framework comprises three components: (i) a balancing autoencoder that uses representation learning to embed high-dimensional country characteristics while addressing treatment selection bias; (ii) a counterfactual generator to compute counterfactual outcomes for varying aid volumes to address small sample-size settings; and (iii) an inference model that is used to predict heterogeneous treatment-response curves. We demonstrate the effectiveness of our framework using data on official development aid earmarked to end HIV/AIDS in 105 countries, amounting to more than USD 5.2 billion. For this, we first show that our framework successfully computes heterogeneous treatment-response curves using semi-synthetic data. Then, we demonstrate our framework using real-world HIV data. Our framework points to large opportunities for more effective aid allocation, suggesting that the total number of new HIV infections could be reduced by up to 3.3% (~50,000 cases) compared to the current allocation practice.
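The framework predicts a treatment-response curve for each country, but the abstract does not say how those curves are turned into an allocation. One simple, commonly used option, assumed here rather than taken from the paper, is greedy marginal allocation. The budget is handed out one unit at a time to whichever country gains most from its next unit. The curves in the example are hypothetical numbers, not model output.

```python
import heapq


def greedy_allocate(curves: dict, budget_units: int) -> dict:
    """Give out aid one unit at a time to the country with the largest next marginal gain.

    curves[c][k] is the predicted benefit (e.g., infections averted) of giving
    country c exactly k units, with curves[c][0] == 0. Greedy allocation is
    optimal when every curve has diminishing (non-increasing) marginal gains.
    """
    alloc = {c: 0 for c in curves}
    heap = []  # entries (-marginal_gain, country); heapq is a min-heap
    for c, curve in curves.items():
        if len(curve) > 1:
            heapq.heappush(heap, (-(curve[1] - curve[0]), c))
    for _ in range(budget_units):
        if not heap:
            break  # every curve is exhausted
        neg_gain, c = heapq.heappop(heap)
        if -neg_gain <= 0:
            break  # no remaining unit improves the predicted outcome
        alloc[c] += 1
        k = alloc[c]
        if k + 1 < len(curves[c]):
            heapq.heappush(heap, (-(curves[c][k + 1] - curves[c][k]), c))
    return alloc


if __name__ == "__main__":
    # Hypothetical predicted response curves for two countries.
    curves = {"A": [0, 10, 15, 17], "B": [0, 8, 14, 18]}
    print(greedy_allocate(curves, budget_units=3))  # {'A': 1, 'B': 2}
```

If the predicted curves are not concave, greedy allocation can miss the optimum. In that case a dynamic program over budget units would be the safer choice.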

covariate, development aid, treatment effect

arXiv:2401.16986

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Africa > Mozambique (0.04)
Asia > India (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology > HIV (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Kencorpus: A Kenyan Language Corpus of Swahili, Dholuo and Luhya for Natural Language Processing Tasks

Wanjawa, Barack, Wanzare, Lilian, Indede, Florence, McOnyango, Owen, Ombui, Edward, Muchemi, Lawrence

arXiv.org Artificial Intelligence, Jul-8-2023

Indigenous African languages are categorized as under-served in Natural Language Processing. They therefore experience poor digital inclusivity and information access. The processing challenge with such languages has been how to use machine learning and deep learning models without the requisite data. The Kencorpus project intends to bridge this gap by collecting and storing text and speech data of sufficient quality for data-driven solutions in applications such as machine translation, question answering and transcription in multilingual communities. The Kencorpus dataset is a text and speech corpus for three languages predominantly spoken in Kenya: Swahili, Dholuo and Luhya. Data collection was done by researchers from communities, schools, media, and publishers. The Kencorpus dataset has a collection of 5,594 items: 4,442 texts (5.6M words) and 1,152 speech files (177 hrs). Based on this data, part-of-speech tagging sets for Dholuo and Luhya (50,000 and 93,000 words, respectively) were developed. We developed 7,537 question-answer pairs for Swahili and created a text translation set of 13,400 sentences from Dholuo and Luhya into Swahili. The datasets are useful for downstream machine learning tasks such as model training and translation. We also developed two proof-of-concept systems, a Kiswahili speech-to-text system and a machine learning system for question answering, which achieved a word error rate of 18.87% and an Exact Match (EM) score of 80%, respectively. These initial results are promising for the usability of Kencorpus to the machine learning community. Kencorpus is one of few public domain corpora for these three low-resource languages and forms a basis for learning and sharing experiences in similar work, especially for low-resource languages.
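The two proof-of-concept results are reported as word error rate (WER) and Exact Match (EM). The sketch below shows how both metrics are conventionally computed. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. EM is the fraction of predicted answers identical to the gold answer after normalisation. The normalisation used here (lowercasing and collapsing whitespace) is an assumption. The abstract does not say which normalisation the authors applied, and some common QA evaluation scripts, such as SQuAD's, also strip punctuation and articles.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # edit distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,            # deletion of reference word r
                          curr[j - 1] + 1,        # insertion of hypothesis word h
                          prev[j - 1] + (r != h)) # substitution, or free match
        prev = curr
    return prev[-1] / len(ref)


def exact_match(predictions, answers) -> float:
    """Share of predictions equal to the gold answer after case/whitespace normalisation."""
    def norm(s: str) -> str:
        return " ".join(s.lower().split())
    hits = sum(norm(p) == norm(a) for p, a in zip(predictions, answers))
    return hits / len(answers)


if __name__ == "__main__":
    print(round(word_error_rate("habari ya asubuhi", "habari za asubuhi"), 3))  # 0.333
    print(exact_match(["Nairobi", " kisumu"], ["nairobi", "Mombasa"]))          # 0.5
```

Corpus-level WER is usually computed by summing edits and reference lengths over all utterances, rather than by averaging per-utterance rates.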

artificial intelligence, machine learning, natural language

arXiv:2208.12081

Country:

Africa > East Africa (0.14)
Africa > Kenya > Nairobi City County > Nairobi (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Media > News (0.93)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Adaptive Interventions for Global Health: A Case Study of Malaria

Periáñez, África, Trister, Andrew, Nekkar, Madhav, del Río, Ana Fernández, Alonso, Pedro L.

arXiv.org Artificial Intelligence, Mar-17-2023

Malaria can be prevented, diagnosed, and treated; however, every year there are more than 200 million cases and 200,000 preventable deaths. Malaria remains a pressing public health concern in low- and middle-income countries, especially in sub-Saharan Africa. We describe how, by means of mobile health applications, machine-learning-based adaptive interventions can strengthen malaria surveillance and treatment adherence, increase testing, measure provider skills and quality of care, improve public health by supporting front-line workers and patients (e.g., through capacity building and encouraging behavioral changes such as using bed nets), reduce test stockouts in pharmacies and clinics, and inform public health policy interventions.
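The abstract does not name the learning algorithm behind its adaptive interventions. A multi-armed bandit is one standard way to make an intervention adaptive: the choice among variants shifts toward whichever works best as feedback arrives. The sketch below uses Thompson sampling with Beta posteriors over binary outcomes. The intervention variants, their "true" success rates, and the binary outcome (for example, whether a patient gets tested after a reminder) are all hypothetical simulation inputs, and this is not presented as the authors' method.

```python
import random


def thompson_choose(successes, failures, rng):
    """Pick the variant with the highest draw from its Beta(1 + s, 1 + f) posterior."""
    draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)


def simulate(true_rates, rounds, seed=0):
    """Simulate binary feedback for each chosen variant; return (successes, failures)."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(rounds):
        arm = thompson_choose(successes, failures, rng)
        if rng.random() < true_rates[arm]:  # hypothetical outcome, e.g. patient got tested
            successes[arm] += 1
        else:
            failures[arm] += 1
    return successes, failures


if __name__ == "__main__":
    # Two hypothetical reminder-message variants with unknown success rates.
    s, f = simulate([0.2, 0.6], rounds=2000, seed=1)
    print("times each variant was sent:", [a + b for a, b in zip(s, f)])
```

Thompson sampling balances exploration and exploitation without a tuning parameter, because uncertain variants still get occasional high posterior draws. Real deployments often add contextual features such as patient or clinic attributes.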

data mining, intervention, machine learning

arXiv:2303.02075

Country:

Africa > Sub-Saharan Africa (0.24)
Africa > Malawi (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Mobile (0.94)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

The Role of Digital Agriculture in Transforming Rural Areas into Smart Villages

Chowdhury, Mohammad Raziuddin, Sourav, Md Sakib Ullah, Sulaiman, Rejwan Bin

arXiv.org Artificial Intelligence, Jan-7-2023

From the perspective of any nation, rural areas generally present a comparable set of problems, such as a lack of proper health care, education, living conditions, wages, and market opportunities. Over the previous few decades, some nations have created and developed the concept of smart villages, which effectively addresses these issues. Digital agriculture has radically altered the landscape of traditional agriculture and has also had a positive economic impact on farmers and rural residents by increasing agricultural production. In this chapter, we explore current issues in rural areas and the consequences of smart village applications, and then illustrate our concept of a smart village with recent examples of how emerging digital agriculture trends help improve agricultural production.

agriculture, artificial intelligence, machine learning

arXiv:2301.10012

Country:

Europe > Poland (0.04)
Europe > Switzerland (0.04)
Europe > Finland (0.04)

Genre:

Overview (0.87)
Research Report (0.64)

Industry:

Health & Medicine (1.00)
Materials > Chemicals > Agricultural Chemicals (0.30)
Food & Agriculture > Agriculture > Pest Control (0.30)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)